Creating a Spontaneous Conversational Speech Corpus

نویسندگان

  • Maria Husin
  • Darryl Stewart
  • Ji Ming
  • Francis Jack Smith
چکیده

Speech recognition and language analysis of spontaneous speech arising in naturally spoken conversations are becoming the subject of much research. However, there is a shortage of spontaneous speech corpora that are freely available for academics. We therefore undertook the building of a natural conversation speech database, recording over 200 hours of conversations in English by over 600 local university students. With few exceptions, the students used their own cell phones from their own rooms or homes to speak to one another, and they were permitted to speak on any topic they chose. Although they knew that they were being recorded and that they would receive a small payment, their conversations in the corpus are probably very close to being natural and spontaneous. This paper describes a detailed case study of the problems we faced and the methods we used to make the recordings and control the collection of these social science data on a limited budget.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Creating a speech corpus with semi-spontaneous, parallel conversational and clear speech Tech Report: CSLU-11-003

Our goal is to collect a speech corpus for the purpose of studying intelligibility and acoustic differences between the conversational and clear speech styles. The ideal corpus has the following properties: (1) speech has been produced spontaneously as part of a communicative interaction, as opposed to having been read to an imagined interlocutor; (2) entire identical utterances, or large parts...

متن کامل

Pronunciation variant analysis using speaking style parallel corpus

To improve the recognition accuracy for spontaneous conversational speech, we collected a corpus to study how spontaneous conversational speech differs from read style speech. The corpus consists of two parts: 1) spontaneous conversational speech and 2) read speech with the same word transcriptions as the conversational speech. In word and phone recognition experiments, it was confirmed that, f...

متن کامل

Acoustic variability in spontaneous conversational speech of american English talkers

Speaker variability strongly impacts human perception and technology performance, yet large-scale, systematic study of the acoustic characteristics involved is rarely undertaken. This study provides statistics on selected segmental and suprasegmental acoustic parameters from measures made on spontaneous conversational telephone speech from 160 speakers in the Switchboard Corpus. Since spontaneo...

متن کامل

Linguistic Resources for Reconstructing Spontaneous Speech Text

The output of a speech recognition system is not always ideal for subsequent downstream processing, in part because speakers themselves often make mistakes. A system would accomplish speech reconstruction of its spontaneous speech input if its output were to represent, in flawless, fluent, and content-preserving English, the message that the speaker intended to convey. These cleaner speech tran...

متن کامل

CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech

This paper describes speech data recording, processing and annotation of a new speech corpus CoRuSS (Corpus of Russian Spontaneous Speech), which is based on connected communicative speech recorded from 60 native Russian male and female speakers of different age groups (from 16 to 77). Some Russian speech corpora available at the moment contain plain orthographic texts and provide some kind of ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Data Science Journal

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2011